Detecting Japanese Term Variation in Textual Corpus

Authors

  • Fuyuki Yoshikane
  • Keita Tsuji
  • Kyo Kageura
  • Christian Jacquemin
Abstract

In this paper, we describe a rule-based mechanism that detects Japanese term variants in textual corpora. The system operates on meta-rules that map syntactic and morpho-syntactic variants of terms to the terms' original forms. The framework used here has been successfully applied to languages such as English and French, and we show that it also works well for detecting Japanese term variants, once the specific characteristics of the Japanese language are properly taken into account. We also discuss the potential of this work for IR-related applications.
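To make the idea of a meta-rule concrete, here is a minimal, hypothetical sketch in Python; it is not the authors' system. It assumes the corpus has already been segmented and POS-tagged by a morphological analyzer, and that a two-noun term such as 情報検索 ("information retrieval") is given as two noun tokens. Each meta-rule states which term structure it applies to and what surface pattern a variant of that term takes. The rule names (`no-insertion`, `wo-predication`, `coordination`), the pattern notation, the functions `elem_matches` and `find_variants`, and the example sentences are all illustrative inventions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# A token is (surface form, part-of-speech tag). The corpus is ASSUMED to be
# pre-segmented and tagged by some morphological analyzer (not shown here).
Token = Tuple[str, str]

# Hypothetical pattern notation used only in this sketch:
#   int         -> the i-th token of the original term, matched by surface form
#   ("lit", s)  -> a literal token with surface form s (e.g. a particle)
#   ("pos", p)  -> any single token whose POS tag is p
Elem = Union[int, Tuple[str, str]]


@dataclass(frozen=True)
class MetaRule:
    name: str
    source_pos: Tuple[str, ...]  # POS sequence the original term must have
    target: Tuple[Elem, ...]     # surface pattern of the variant


# Illustrative meta-rules for two-noun terms such as 情報検索.
RULES = [
    # N1 N2 -> N1 の N2  (genitive particle insertion: 情報の検索)
    MetaRule("no-insertion", ("noun", "noun"), (0, ("lit", "の"), 1)),
    # N1 N2 -> N1 を N2  (object + verbal noun: 情報を検索する)
    MetaRule("wo-predication", ("noun", "noun"), (0, ("lit", "を"), 1)),
    # N1 N2 -> N1 と N の N2  (coordination: 情報と文書の検索)
    MetaRule("coordination", ("noun", "noun"),
             (0, ("lit", "と"), ("pos", "noun"), ("lit", "の"), 1)),
]


def elem_matches(elem: Elem, term: List[Token], tok: Token) -> bool:
    """Check one pattern element against one corpus token."""
    surface, pos = tok
    if isinstance(elem, int):
        return surface == term[elem][0]
    kind, value = elem
    if kind == "lit":
        return surface == value
    return pos == value  # kind == "pos"


def find_variants(term: List[Token], corpus: List[Token], rules=RULES):
    """Return (rule name, start index, variant string) for every match."""
    term_pos = tuple(pos for _, pos in term)
    hits = []
    for rule in rules:
        if rule.source_pos != term_pos:
            continue  # the rule does not apply to this term's structure
        n = len(rule.target)
        for start in range(len(corpus) - n + 1):
            window = corpus[start:start + n]
            if all(elem_matches(e, term, t) for e, t in zip(rule.target, window)):
                hits.append((rule.name, start, "".join(s for s, _ in window)))
    return hits


# Hypothetical toy data: three short tagged sentences.
TERM = [("情報", "noun"), ("検索", "noun")]
CORPUS = [
    ("情報", "noun"), ("の", "particle"), ("検索", "noun"), ("は", "particle"),
    ("重要", "adj"), ("だ", "aux"),
    ("情報", "noun"), ("を", "particle"), ("検索", "noun"), ("する", "verb"),
    ("情報", "noun"), ("と", "particle"), ("文書", "noun"), ("の", "particle"),
    ("検索", "noun"),
]

if __name__ == "__main__":
    for hit in find_variants(TERM, CORPUS):
        print(hit)
```

On this toy corpus the script prints `('no-insertion', 0, '情報の検索')`, `('wo-predication', 6, '情報を検索')` and `('coordination', 10, '情報と文書の検索')`. Note that the coordination variant also contains 文書の検索, which the same rules would report as a `no-insertion` variant of 文書検索; a real system would need some policy for such overlapping matches.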

Similar resources

A Linguistic and Mathematical Method for Mapping Thematic Trends from Texts

We present a novel method for mapping thematic trends called "Classification by Preferential Clustered Link" (CPCL). This method clusters relevant textual units (terms) from a corpus of texts, based on meaningful linguistic relations (syntactic variations) identified amongst the units. Terms related through syntactic variations are represented in the form of a graph and are first clustered into...

Detecting Term Relationships to Improve Textual Document Sanitization

Nowadays, the publication of textual documents provides critical benefits to scientific research and business scenarios where information analysis plays an essential role. Nevertheless, the possible existence of identifying or confidential data in this kind of document motivates the use of measures to sanitize sensitive information before being published, while keeping the innocuous data unmod...

Terminology-driven Augmentation of Bilingual Terminologies

This paper proposes a way of augmenting bilingual terminologies by using a “generate and validate” method. Using existing bilingual terminologies, the method generates “potential” bilingual multi-word term pairs and validates their status by searching web documents to check whether such terms actually exist in each language. Unlike most existing bilingual term extraction methods, which use para...

FlexiTerm: a flexible term recognition method

BACKGROUND The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. The dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or the newly emerging types of media such as patient...

Textual Enhancement across Linguistic Structures: EFL Learners' Acquisition of English Forms

The benefits of textual input enhancement in the acquisition of linguistic forms have produced mixed results in SLA literature. The present study investigates the effects of textual enhancement on adult foreign language intake of two English linguistic forms (subjunctive mood and inversion structures) to explore the role of the type of linguistic items in input enhancement studies. It also invest...

Publication date: 1999